The data, pulled from the CPD data warehouse, has the following format:
## DATEOCC YEAR MONTH DAY DOW CURR_IUCR FBI_CD AREA BEAT DISTRICT
## 1 2008-01-01 2008 1 1 Tue 610 5 2 623 6
## 2 2008-01-01 2008 1 1 Tue 610 5 5 1421 14
## 3 2008-01-01 2008 1 1 Tue 610 5 1 824 8
## X_COORD Y_COORD LOCATION INC_CNT
## 1 1179897 1852178 210 1
## 2 1157184 1911751 210 1
## 3 1159285 1867501 290 1
A preview of the variables
## 'data.frame': 161738 obs. of 14 variables:
## $ DATEOCC : Date, format: "2008-01-01" "2008-01-01" ...
## $ YEAR : int 2008 2008 2008 2008 2008 2008 2008 2008 2008 2008 ...
## $ MONTH : int 1 1 1 1 1 1 1 1 1 1 ...
## $ DAY : int 1 1 1 1 1 1 1 1 1 1 ...
## $ DOW : Factor w/ 7 levels "Sun","Mon","Tue",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ CURR_IUCR: Factor w/ 4 levels "610","620","630",..: 1 1 1 1 2 1 1 1 2 1 ...
## $ FBI_CD : Factor w/ 1 level "5": 1 1 1 1 1 1 1 1 1 1 ...
## $ AREA : Factor w/ 5 levels "1","2","3","4",..: 2 5 1 5 1 4 5 2 1 2 ...
## $ BEAT : Factor w/ 301 levels "111","112","113",..: 68 174 97 178 112 168 293 35 96 265 ...
## $ DISTRICT : Factor w/ 26 levels "1","2","3","4",..: 6 14 8 14 9 13 25 3 8 22 ...
## $ X_COORD : int 1179897 1157184 1159285 1158917 1170421 1159992 1150997 1187783 1158765 1172773 ...
## $ Y_COORD : int 1852178 1911751 1867501 1916149 1881971 1901896 1919848 1855603 1862371 1835967 ...
## $ LOCATION : Factor w/ 83 levels "","090","092",..: 40 40 63 40 63 39 40 16 63 52 ...
## $ INC_CNT : int 1 1 1 1 1 1 1 1 1 1 ...
A summary of the data
summary(BurglaryData)
## DATEOCC YEAR MONTH DAY
## Min. :2008-01-01 Min. :2008 Min. : 1.000 Min. : 1.00
## 1st Qu.:2009-07-30 1st Qu.:2009 1st Qu.: 4.000 1st Qu.: 8.00
## Median :2011-01-22 Median :2011 Median : 7.000 Median :16.00
## Mean :2011-03-06 Mean :2011 Mean : 6.812 Mean :15.89
## 3rd Qu.:2012-08-31 3rd Qu.:2012 3rd Qu.:10.000 3rd Qu.:23.00
## Max. :2014-12-31 Max. :2014 Max. :12.000 Max. :31.00
##
## DOW CURR_IUCR FBI_CD AREA BEAT
## Sun:17727 610:108315 5:161738 1 :46520 421 : 1840
## Mon:25003 620: 43787 2 :48406 423 : 1675
## Tue:24858 630: 7431 3 :31206 414 : 1505
## Wed:24617 650: 2205 4 :11883 835 : 1481
## Thu:24642 5 :23719 831 : 1416
## Fri:25843 NA's: 4 (Other):153818
## Sat:19048 NA's : 3
## DISTRICT X_COORD Y_COORD LOCATION
## 8 : 14982 Min. :1099259 Min. :1813949 290 :58308
## 4 : 12674 1st Qu.:1153260 1st Qu.:1855402 090 :53206
## 25 : 11060 Median :1165290 Median :1873772 210 :23140
## 7 : 10787 Mean :1164957 Mean :1882216 330 : 4863
## 3 : 10504 3rd Qu.:1176635 3rd Qu.:1911560 261 : 2975
## (Other):101728 Max. :1205079 Max. :1951601 200 : 2442
## NA's : 3 (Other):16804
## INC_CNT
## Min. :1
## 1st Qu.:1
## Median :1
## Mean :1
## 3rd Qu.:1
## Max. :1
##
From the summary, we can already see that burglaries follow a weekly pattern: weekends have noticeably fewer incidents (17,727 on Sundays and 19,048 on Saturdays) than weekdays (between 24,617 and 25,843 each). The majority of burglaries are of type 0610 (coded as 610 in the data), which is Forcible Entry; the other types are 0620 (Unlawful Entry), 0630 (Attempt Forcible Entry), and 0650 (Home Invasion). The top three location types are 290 (Residence), 090 (Apartment), and 210 (Residence Garage).
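The weekend versus weekday comparison can be made explicit by averaging the per-day-of-week counts. The following is a minimal sketch on a toy data frame (the rows are made up for illustration; `toy`, `weekend_mean`, and `weekday_mean` are hypothetical names). Applied to the real data, `toy` would be replaced by `BurglaryData`.

```r
# Toy stand-in for BurglaryData; only DOW and INC_CNT are needed here.
dow_levels <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
toy <- data.frame(
  DOW = factor(c("Sun", "Mon", "Mon", "Tue", "Wed", "Thu", "Fri", "Fri", "Sat"),
               levels = dow_levels),
  INC_CNT = 1L
)

# Total incidents for each day of the week
dow_counts <- tapply(toy$INC_CNT, toy$DOW, sum)

# Average count per day of week, weekend versus weekday
is_weekend <- names(dow_counts) %in% c("Sat", "Sun")
weekend_mean <- mean(dow_counts[is_weekend])
weekday_mean <- mean(dow_counts[!is_weekend])
```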
The distribution of crime counts in each area
##
## 1 2 3 4 5 <NA>
## 46520 48406 31206 11883 23719 4
and in each district
##
## 1 2 3 4 5 6 7 8 9 10 11 12
## 1389 4423 10504 12674 8391 10471 10787 14982 7914 5539 5756 3623
## 13 14 15 16 17 18 19 20 21 22 23 24
## 2953 8151 4659 5796 6033 3344 6410 2546 1771 6580 1393 4585
## 25 31 <NA>
## 11060 1 3
Note that District 31 has only 1 incident during the 7-year period.
Missing values occur only in the attributes AREA, DISTRICT, and BEAT, and most of them fall on the same rows.
The area, district, and beat polygon maps, drawn from the shapefiles provided by CPD, are shown below.
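One way to check whether missing values share row indices is to collect the `NA` positions per attribute and compare them. The sketch below uses a small made-up data frame (`toy` and its rows are hypothetical); on the real data the same calls would be applied to `BurglaryData`.

```r
# Toy stand-in for BurglaryData with a few missing values (hypothetical rows).
toy <- data.frame(
  AREA     = factor(c("1", NA, "2", NA, "5", NA)),
  DISTRICT = factor(c("6", NA, "14", NA, "8", "3")),
  BEAT     = factor(c("623", NA, "1421", NA, "824", "312"))
)

# Row indices with a missing value, one vector per attribute
na_rows <- lapply(toy[c("AREA", "DISTRICT", "BEAT")],
                  function(col) which(is.na(col)))

# Rows missing in all three attributes, and rows missing in at least one
missing_in_all <- Reduce(intersect, na_rows)
missing_in_any <- Reduce(union, na_rows)
```

If `missing_in_all` and `missing_in_any` are nearly the same set, the missing values are concentrated on the same records.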
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/xiaomuliu/CrimeProject/SpatioTemporalModeling/CPDShapeFiles/", layer: "area_bndy"
## with 8 features and 3 fields
## Feature type: wkbPolygon with 2 dimensions
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/xiaomuliu/CrimeProject/SpatioTemporalModeling/CPDShapeFiles/", layer: "district_bndy"
## with 28 features and 3 fields
## Feature type: wkbPolygon with 2 dimensions
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/xiaomuliu/CrimeProject/SpatioTemporalModeling/CPDShapeFiles/", layer: "beat_bndy"
## with 288 features and 3 fields
## Feature type: wkbPolygon with 2 dimensions
A scatter plot of burglary locations for a single day (2014-01-01)
Let’s first aggregate the data by policing beat and district. The two plots below examine whether different districts share similar seasonal patterns.
The top plot shows the daily crime time series. Note that the series for districts 13, 21, and 23 appear to be truncated. It turns out that data for districts 13, 21, and 23 are only available up to 2012/12/16, 2013/03/02, and 2013/03/01 respectively. For the bottom plot, the crime counts were first grouped by year and then aggregated by district and month.
Grouping by beat gives a higher-resolution view of the spatial and temporal patterns. However, as there are nearly 300 beats, we use a heat map instead of a multi-panel plot to show these patterns.
Again, some beats show a strong seasonal pattern with a decreasing trend, while others do not. Burglary counts in adjacent beats are usually similar.
Now let’s shift from the study of policing regions to city-wide analysis. Here is an incident location plot for each month of 2014.
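Both the truncation check and the monthly aggregation can be expressed with `aggregate`. This is a minimal sketch on made-up rows (`toy`, `last_seen`, and `monthly` are hypothetical names, and the dates are illustrative only), not necessarily the procedure used for the plots.

```r
# Toy stand-in for BurglaryData (hypothetical rows, not real incidents).
toy <- data.frame(
  DATEOCC  = as.Date(c("2012-11-30", "2012-12-16", "2014-12-31",
                       "2013-03-01", "2014-06-15", "2014-06-20")),
  DISTRICT = factor(c("13", "13", "8", "23", "8", "8")),
  INC_CNT  = 1L
)
toy$YEAR  <- as.integer(format(toy$DATEOCC, "%Y"))
toy$MONTH <- as.integer(format(toy$DATEOCC, "%m"))

# Last date with a recorded incident in each district;
# a date well before the end of the study period suggests a truncated series
last_seen <- aggregate(DATEOCC ~ DISTRICT, data = toy, FUN = max)

# Incident counts by district, year, and month
monthly <- aggregate(INC_CNT ~ DISTRICT + YEAR + MONTH, data = toy, FUN = sum)
```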
It is difficult to tell whether crime location clusters vary over time just by looking at the point plots, so let’s move to grid (pixel) based analysis. First, the point data was rasterized by binning into a 100 \(\times\) 100 grid (the boundaries were defined by the range of the x-coordinates and y-coordinates of all available crime locations, plus a margin of 1000 units on each side). Below is an example of pixelized burglary locations in January 2014.
Next, we perform kernel density estimation (KDE) on the monthly aggregation for each year. The kernel applied here is a 2D Gaussian kernel with the same bandwidth in each direction. The bandwidth was selected through cross-validation (minimizing MSE) using all available data (2008 to 2014).
The animation below shows the KDE for each year (2008 to 2014).
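The binning step can be sketched in base R as follows. The point coordinates here are randomly generated stand-ins for `X_COORD` and `Y_COORD`, and the variable names (`n_bins`, `margin`, `grid`) are assumptions made for illustration; the original analysis may have used a dedicated raster package instead.

```r
set.seed(1)
# Synthetic point locations standing in for X_COORD / Y_COORD (not real data).
x <- runif(500, 1100000, 1205000)
y <- runif(500, 1814000, 1951000)

n_bins <- 100   # grid is n_bins x n_bins
margin <- 1000  # margin added on each side, in coordinate units

# Cell boundaries: coordinate range plus the margin on each side
x_breaks <- seq(min(x) - margin, max(x) + margin, length.out = n_bins + 1)
y_breaks <- seq(min(y) - margin, max(y) + margin, length.out = n_bins + 1)

# Index of the cell each point falls into (1..n_bins along each axis)
x_idx <- findInterval(x, x_breaks, rightmost.closed = TRUE)
y_idx <- findInterval(y, y_breaks, rightmost.closed = TRUE)

# Count points per cell; rows index x cells, columns index y cells
counts <- table(factor(x_idx, levels = 1:n_bins),
                factor(y_idx, levels = 1:n_bins))
grid <- matrix(as.integer(counts), nrow = n_bins, ncol = n_bins)
```

The resulting `grid` matrix can be displayed with `image(grid)`.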
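To illustrate bandwidth selection for an isotropic 2D Gaussian kernel, the sketch below uses least-squares cross-validation, which minimizes an estimate of the integrated squared error. This is a stand-in chosen for illustration: the text does not specify the exact cross-validation scheme used, and the points, the candidate bandwidth grid `h_grid`, and all function names here are assumptions.

```r
set.seed(42)
# Synthetic 2D points (two clusters) standing in for incident coordinates.
pts <- rbind(
  cbind(rnorm(100, 0, 1), rnorm(100, 0, 1)),
  cbind(rnorm(100, 5, 1), rnorm(100, 5, 1))
)
n <- nrow(pts)
d2 <- as.matrix(dist(pts))^2  # squared pairwise distances

# Least-squares cross-validation score for bandwidth h:
# integral of the squared KDE minus twice the mean leave-one-out density
lscv <- function(h) {
  int_f2 <- sum(exp(-d2 / (4 * h^2))) / (4 * pi * h^2 * n^2)
  k <- exp(-d2 / (2 * h^2)) / (2 * pi * h^2)
  diag(k) <- 0                    # leave each point out of its own estimate
  loo <- rowSums(k) / (n - 1)
  int_f2 - 2 * mean(loo)
}

# Pick the candidate bandwidth with the lowest score
h_grid <- seq(0.1, 2, by = 0.05)
scores <- vapply(h_grid, lscv, numeric(1))
h_best <- h_grid[which.min(scores)]

# Evaluate the KDE with the chosen bandwidth on a regular 50 x 50 grid
gx <- seq(min(pts[, 1]) - 1, max(pts[, 1]) + 1, length.out = 50)
gy <- seq(min(pts[, 2]) - 1, max(pts[, 2]) + 1, length.out = 50)
kde_at <- function(px, py, h) {
  mean(exp(-((px - pts[, 1])^2 + (py - pts[, 2])^2) / (2 * h^2))) / (2 * pi * h^2)
}
dens <- outer(gx, gy, Vectorize(function(a, b) kde_at(a, b, h_best)))
```

The same bandwidth is used in both directions, matching the isotropic kernel described above; `image(gx, gy, dens)` displays the estimated density surface.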